A Data Reorganization Technique for Improving Data Locality of Irregular Applications in Software Distributed Shared Memory
Abstract
Irregular applications are characterized by highly irregular and fine-grained data referencing patterns. When there is poor locality between the fine-grained data, serious false sharing can occur, which has largely contributed to the poor performance of irregular applications on page-based software distributed shared memory (DSM) systems. Partitioning data in irregular applications to improve data locality has been routinely practiced in message passing programming to reduce inter-processor communication, but data reordering is often ignored in shared memory programming. In this paper we show that data reordering is also important in programming irregular applications under software DSMs, in particular because of the false sharing effect. We describe a simple yet highly effective data reordering technique which can significantly improve data locality and reduce false sharing in many irregular applications. We evaluate the effectiveness of this technique on a set of five irregular applications under both Princeton's home-based protocol and Rice's TreadMarks protocol on a 16-processor platform. We find that with a simple data ordering during the initialization, the performance of these irregular applications is improved by 30%-366% under TreadMarks and 14%-269% under the home-based protocol.
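As a rough illustration of what reordering data during initialization can look like, the sketch below sorts 2D points by a Morton (Z-order) key so that spatially nearby objects become adjacent in the array and therefore tend to fall on the same page. The abstract does not name the ordering the authors use, so the choice of Morton order, the function names (morton_key, reorder, pages_touched), and the page size of 4 objects are all assumptions made for illustration only.

```python
def morton_key(x, y, bits=16):
    """Interleave the bits of non-negative integers x and y (Z-order key)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # x bits go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)  # y bits go to odd positions
    return key


def reorder(points):
    """Return the points sorted by Morton key, plus an old-to-new index map."""
    order = sorted(range(len(points)), key=lambda i: morton_key(*points[i]))
    new_points = [points[i] for i in order]
    old_to_new = {old: new for new, old in enumerate(order)}
    return new_points, old_to_new


def pages_touched(indices, objects_per_page):
    """Count the distinct pages covered by a set of array indices."""
    return len({i // objects_per_page for i in indices})


# Hypothetical setup: a 4x4 grid stored row-major, 4 objects per page,
# and one processor that owns the quadrant with x < 2 and y < 2.
points = [(x, y) for y in range(4) for x in range(4)]
quadrant = [i for i, (x, y) in enumerate(points) if x < 2 and y < 2]

new_points, old_to_new = reorder(points)
before = pages_touched(quadrant, 4)
after = pages_touched([old_to_new[i] for i in quadrant], 4)
print(before, after)  # prints "2 1"
```

In this toy setup the processor's four objects span two pages before reordering and one page afterwards. Fewer pages per processor means fewer pages written by more than one processor, which is the situation that produces false sharing in a page-based DSM.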
Similar resources
Improving Compiler and Run-Time Support for Irregular Reductions Using Local Writes
Current compilers for distributed-memory multiprocessors parallelize irregular reductions either by generating calls to sophisticated run-time systems (CHAOS) or by relying on replicated buffers and the shared-memory interface supported by software DSMs (TreadMarks). We introduce LocalWrite, a new technique for parallelizing irregular reductions based on the owner-computes rule. It eliminates th...
Efficient compiler and run-time support for parallel irregular reductions
Many scientific applications are comprised of irregular reductions on large data sets. In shared-memory parallel programs, these irregular reductions are typically computed in parallel using replicated buffers, then combined using synchronization. We develop LOCALWRITE, a new technique which partitions irregular reductions so that each processor computes values only for locally assigned data, eli...
Efficient Compiler and Run-Time Support for Parallel Irregular
Many scientific applications are comprised of irregular reductions on large data sets. In shared-memory parallel programs, these irregular reductions are typically computed in parallel using replicated buffers, then combined using synchronization. We develop LocalWrite, a new technique which partitions irregular reductions so that each processor computes values only for locally assigned data, elim...
Parallelizing Irregular Applications through the YAPPA Compilation Framework
Modern High Performance Computing (HPC) clusters are composed of hundreds of nodes integrating multicore processors with advanced cache hierarchies. These systems can reach several petaflops of peak performance, but are optimized for floating point intensive applications, and regular, localizable data structures. The network interconnection of these systems is optimized for bulk, synchronous tra...
Dynamic shared data in structured parallel programming frameworks
This work originates from the wish to simplify the coding of irregular applications within structured parallel programming environments. In these environments parallelism is exploited by composing “skeletons”, i.e. parallelism exploitation patterns. The skeletal approach has been proved to be effective, at least if application algorithms can be somehow expressed in terms of skeleton composition...